Efficient exploration with Double Uncertain Value Networks
Abstract
This paper studies directed exploration for reinforcement learning agents by tracking uncertainty about the value of each available action. We identify two sources of uncertainty that are relevant for exploration. The first originates from limited data (parametric uncertainty), while the second originates from the distribution of the returns (return uncertainty). We describe methods to learn these distributions with deep neural networks: parametric uncertainty is estimated with Bayesian dropout, while return uncertainty is propagated through the Bellman equation as a Gaussian distribution. We then show that both can be estimated jointly in one network, which we call the Double Uncertain Value Network. The policy is derived directly from the learned distributions using Thompson sampling. Experimental results show that both types of uncertainty may vastly improve learning in domains with a strong exploration challenge.
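The interplay of the two uncertainty types in Thompson sampling can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no architecture, so the one-hidden-layer network, the layer sizes, the dropout rate, and every function name below are hypothetical. The Bellman target additionally assumes a deterministic reward and a single known next state-action pair, in which case a Gaussian return $\mathcal{N}(\mu', \sigma'^2)$ maps to $\mathcal{N}(r + \gamma\mu', \gamma^2\sigma'^2)$.

```python
import numpy as np


def init_params(n_state, n_hidden, n_actions, rng):
    """Random weights for a one-hidden-layer network (hypothetical sizes)."""
    return {
        "W1": rng.normal(0.0, 0.5, (n_state, n_hidden)),
        "W_mu": rng.normal(0.0, 0.5, (n_hidden, n_actions)),
        "W_sigma": rng.normal(0.0, 0.5, (n_hidden, n_actions)),
    }


def forward(params, state, rng, keep_prob=0.8):
    """One stochastic forward pass with a freshly sampled dropout mask.

    Each mask corresponds to one sample of the network parameters
    (parametric uncertainty). The outputs are the mean and standard
    deviation of a Gaussian return distribution per action.
    """
    hidden = np.tanh(state @ params["W1"])
    mask = rng.random(hidden.shape) < keep_prob
    hidden = hidden * mask / keep_prob            # inverted dropout scaling
    mu = hidden @ params["W_mu"]
    sigma = np.log1p(np.exp(hidden @ params["W_sigma"]))  # softplus keeps sigma > 0
    return mu, sigma


def thompson_action(params, state, rng):
    """Sample parameters, then sample a return per action, then act greedily."""
    mu, sigma = forward(params, state, rng)       # sample over parameters
    sampled_returns = rng.normal(mu, sigma)       # sample over returns
    return int(np.argmax(sampled_returns))


def gaussian_bellman_target(reward, gamma, mu_next, sigma_next):
    """Mean and std of r + gamma * Z for Z ~ N(mu_next, sigma_next**2).

    Assumes a deterministic reward and a fixed next state-action pair.
    """
    return reward + gamma * mu_next, gamma * sigma_next


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_params(n_state=4, n_hidden=16, n_actions=3, rng=rng)
    state = np.ones(4)
    print("sampled action:", thompson_action(params, state, rng))
    print("target (mean, std):", gaussian_bellman_target(1.0, 0.9, 2.0, 0.5))
```

Calling `thompson_action` repeatedly in the same state can return different actions, because both the dropout mask and the return sample are redrawn on every call. In this sketch that randomness is the only source of exploration, so no separate epsilon-greedy step is needed.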
Journal: CoRR
Volume: abs/1711.10789
Publication date: 2017